在选择组套索(或普遍的变体,例如重叠,稀疏或标准化的组套索)之后,在没有选择偏见的调整的情况下,对所选参数的推断是不可靠的。在受惩罚的高斯回归设置中,现有方法为选择事件提供了调整,这些事件可以表示为数据变量中的线性不平等。然而,这种表示未能与组套索一起选择,并实质上阻碍了随后的选择后推断的范围。推论兴趣的关键问题 - 例如,推断选定变量对结果的影响 - 仍未得到解答。在本文中,我们开发了一种一致的,选择性的贝叶斯方法,通过得出似然调整因子和近似值来解决现有差距,从而消除了组中的偏见。对模拟数据和人类Connectome项目数据的实验表明,我们的方法恢复了所选组中参数的影响,同时仅支付较小的偏差调整价格。
translated by 谷歌翻译
Supervised Question Answering systems (QA systems) rely on domain-specific human-labeled data for training. Unsupervised QA systems generate their own question-answer training pairs, typically using secondary knowledge sources to achieve this outcome. Our approach (called PIE-QG) uses Open Information Extraction (OpenIE) to generate synthetic training questions from paraphrased passages and uses the question-answer pairs as training data for a language model for a state-of-the-art QA system based on BERT. Triples in the form of <subject, predicate, object> are extracted from each passage, and questions are formed with subjects (or objects) and predicates while objects (or subjects) are considered as answers. Experimenting on five extractive QA datasets demonstrates that our technique achieves on-par performance with existing state-of-the-art QA systems with the benefit of being trained on an order of magnitude fewer documents and without any recourse to external reference data sources.
translated by 谷歌翻译
Recent work has reported that AI classifiers trained on audio recordings can accurately predict severe acute respiratory syndrome coronavirus 2 (SARSCoV2) infection status. Here, we undertake a large scale study of audio-based deep learning classifiers, as part of the UK governments pandemic response. We collect and analyse a dataset of audio recordings from 67,842 individuals with linked metadata, including reverse transcription polymerase chain reaction (PCR) test outcomes, of whom 23,514 tested positive for SARS CoV 2. Subjects were recruited via the UK governments National Health Service Test-and-Trace programme and the REal-time Assessment of Community Transmission (REACT) randomised surveillance survey. In an unadjusted analysis of our dataset AI classifiers predict SARS-CoV-2 infection status with high accuracy (Receiver Operating Characteristic Area Under the Curve (ROCAUC) 0.846 [0.838, 0.854]) consistent with the findings of previous studies. However, after matching on measured confounders, such as age, gender, and self reported symptoms, our classifiers performance is much weaker (ROC-AUC 0.619 [0.594, 0.644]). Upon quantifying the utility of audio based classifiers in practical settings, we find them to be outperformed by simple predictive scores based on user reported symptoms.
translated by 谷歌翻译
The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, during dominant transmission of the Alpha and Delta SARS-CoV-2 variants and some Omicron variant sublineages. Audio recordings of volitional coughs, exhalations, and speech were collected in the 'Speak up to help beat coronavirus' digital survey alongside demographic, self-reported symptom and respiratory condition data, and linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,794 of 72,999 participants and 24,155 of 25,776 positive cases. Respiratory symptoms were reported by 45.62% of participants. This dataset has additional potential uses for bioacoustics research, with 11.30% participants reporting asthma, and 27.20% with linked influenza PCR test results.
translated by 谷歌翻译
The performance of decision policies and prediction models often deteriorates when applied to environments different from the ones seen during training. To ensure reliable operation, we propose and analyze the stability of a system under distribution shift, which is defined as the smallest change in the underlying environment that causes the system's performance to deteriorate beyond a permissible threshold. In contrast to standard tail risk measures and distributionally robust losses that require the specification of a plausible magnitude of distribution shift, the stability measure is defined in terms of a more intuitive quantity: the level of acceptable performance degradation. We develop a minimax optimal estimator of stability and analyze its convergence rate, which exhibits a fundamental phase shift behavior. Our characterization of the minimax convergence rate shows that evaluating stability against large performance degradation incurs a statistical cost. Empirically, we demonstrate the practical utility of our stability framework by using it to compare system designs on problems where robustness to distribution shift is critical.
translated by 谷歌翻译
Tumor segmentation in histopathology images is often complicated by its composition of different histological subtypes and class imbalance. Oversampling subtypes with low prevalence features is not a satisfactory solution since it eventually leads to overfitting. We propose to create synthetic images with semantically-conditioned deep generative networks and to combine subtype-balanced synthetic images with the original dataset to achieve better segmentation performance. We show the suitability of Generative Adversarial Networks (GANs) and especially diffusion models to create realistic images based on subtype-conditioning for the use case of HER2-stained histopathology. Additionally, we show the capability of diffusion models to conditionally inpaint HER2 tumor areas with modified subtypes. Combining the original dataset with the same amount of diffusion-generated images increased the tumor Dice score from 0.833 to 0.854 and almost halved the variance between the HER2 subtype recalls. These results create the basis for more reliable automatic HER2 analysis with lower performance variance between individual HER2 subtypes.
translated by 谷歌翻译
我们提供了证据表明,学到的密度功能理论(``dft')的力场已准备好进行基态催化剂发现。我们的关键发现是,尽管预测的力与地面真相有很大差异,但使用从超过50 \%的评估系统中使用RPBE功能的能量与使用RPBE功能相似或较低能量的力量的力量与使用RPBE功能相似或较低的力量放松。这具有令人惊讶的含义,即学习的潜力可能已经准备好在挑战性的催化系统中替换DFT,例如在Open Catalyst 2020数据集中发现的电位。此外,我们表明,在局部谐波能量表面上具有与目标DFT能量相同的局部谐波能量表面训练的力场也能够在50 \%的情况下找到较低或相似的能量结构。与在真实能量和力量训练的标准模型相比,这种``简易电位''的收敛步骤更少,这进一步加速了计算。它的成功说明了一个关键:即使模型具有高力误差,学到的电位也可以定位能量最小值。结构优化的主要要求仅仅是学到的电位具有正确的最小值。由于学到的电位与系统大小的速度快速且尺寸为线性,因此我们的结果开辟了快速找到大型系统基础状态的可能性。
translated by 谷歌翻译
机器学习(ML)是一种在车辆互联网(IOV)上培训预测模型的分布式方法,以实现智能公共交通。由于交通状况会随着时间而变化,因此必须连续有效地更新流量流动和乘客等待时间的ML模型。联合学习(FL)是一种分布式机器学习方案,允许车辆接收连续的模型更新,而无需将原始数据上传到云中并等待培训模型。但是,由于车辆在公共场所旅行以来,智能公共交通中FL容易受到中毒或DDOS攻击的影响。此外,由于设备异质性和不平衡数据分布,同步聚合策略在聚集之前从特定车辆中收集本地模型的同步聚合策略效率低下。尽管有异步联合学习(AFL)方案是通过收到本地模型来提高效率的,但陈旧的本地模型仍然不合理地加权,导致学习绩效不佳。为了实现更明智的公共交通,本文提供了一个基于动态缩放系数(DBAFL)的基于区块链的异步联合学习方案。具体而言,基于委员会的新型共识算法用于区块链,以最低的时间成本提高了可靠性。同时,设计的动态缩放系数允许AFL为陈旧的本地模型分配合理的重量。在异质设备上进行的广泛实验验证了DBAFL的学习效果,效率和可靠性优于外观的实验。
translated by 谷歌翻译
我们最近提出了一个以DBM为中心的新群集操作系统堆栈DBO。DBO通过将ML代码封装在存储过程中,集中辅助ML数据,为基础DBMS内置的安全性,共同关注ML代码和数据以及跟踪数据和工作流源来源,从而为ML应用程序提供了独特的支持。在这里,我们在两个ML应用程序附近演示了这些好处的子集。我们首先表明,使用GPU的图像分类和对象检测模型可以用作DBOS存储程序,具有与现有系统竞争性能的DBOS存储程序。然后,我们提出了一项1D CNN,训练有素,可以在DBOS支持的Web服务上检测HTTP请求中的异常情况,从而实现SOTA结果。我们使用此模型来开发交互式异常检测系统,并通过定性用户反馈对其进行评估,并证明了其有用性作为未来工作的概念证明,以在DBO上开发实时的实时安全服务。
translated by 谷歌翻译
语言模型既展示了定量的改进,又展示了新的定性功能,随着规模的增加。尽管它们具有潜在的变革性影响,但这些新能力的特征却很差。为了为未来的研究提供信息,为破坏性的新模型能力做准备,并改善社会有害的效果,至关重要的是,我们必须了解目前和近乎未来的能力和语言模型的局限性。为了应对这一挑战,我们介绍了超越模仿游戏基准(Big Bench)。 Big Bench目前由204个任务组成,由132家机构的442位作者贡献。任务主题是多样的,从语言学,儿童发展,数学,常识性推理,生物学,物理学,社会偏见,软件开发等等。 Big-Bench专注于被认为超出当前语言模型的功能的任务。我们评估了OpenAI的GPT型号,Google内部密集变压器体系结构和大型基础上的开关稀疏变压器的行为,跨越了数百万到数十亿个参数。此外,一个人类专家评估者团队执行了所有任务,以提供强大的基准。研究结果包括:模型性能和校准都随规模改善,但绝对的术语(以及与评估者的性能相比);在模型类中的性能非常相似,尽管带有稀疏性。逐渐和预测的任务通常涉及大量知识或记忆成分,而在临界规模上表现出“突破性”行为的任务通常涉及多个步骤或组成部分或脆性指标;社交偏见通常会随着含糊不清的环境而随着规模而增加,但这可以通过提示来改善。
translated by 谷歌翻译